nonlinear function
Provable Subspace Identification Under Post-Nonlinear Mixtures
Unsupervised mixture learning (UML) aims to identify linearly or nonlinearly mixed latent components in a blind manner. UML is known to be challenging: even learning linear mixtures requires highly nontrivial analytical tools, e.g., independent component analysis or nonnegative matrix factorization. In this work, the post-nonlinear (PNL) mixture model, in which unknown element-wise nonlinear functions are imposed on a linear mixture, is revisited. The PNL model is widely employed in fields ranging from brain signal classification, speech separation, and remote sensing to causal discovery. To identify and remove the unknown nonlinear functions, existing works often assume various properties of the latent components (e.g., statistical independence or probability-simplex structures).
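To make the PNL setup concrete, here is a minimal sketch of sampling from such a model; the mixing matrix, latent distribution, and nonlinearities below are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Post-nonlinear (PNL) model: x = g(A s), where g applies an unknown
# invertible nonlinearity to each coordinate of the linear mixture A s.
K, M, N = 3, 4, 1000           # latent dim, observed dim, samples
S = rng.uniform(size=(K, N))   # latent components (illustrative choice)
A = rng.normal(size=(M, K))    # unknown linear mixing matrix

# Element-wise invertible nonlinearities g_m (unknown to the learner);
# tanh, cube root, etc. are common illustrative choices, not the paper's.
g = [np.tanh, np.cbrt, np.sinh, lambda z: z + 0.2 * z**3]
Z = A @ S                      # linear mixture
X = np.stack([g[m](Z[m]) for m in range(M)])   # observed PNL mixture

# A PNL learner sees only X and must undo g and A, up to the usual
# scaling/permutation ambiguities.
print(X.shape)  # (4, 1000)
```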
Secret mixtures of experts inside your LLM
Despite being one of the earliest neural network layers, the Multilayer Perceptron (MLP) is arguably one of the least understood parts of the transformer architecture due to its dense computation and lack of easy visualization. This paper seeks to understand the MLP layers in dense LLMs by hypothesizing that these layers secretly and approximately perform a sparse computation -- namely, that they can be well approximated by sparsely-activating Mixture of Experts (MoE) layers. Our hypothesis is based on a novel theoretical connection between MoE models and Sparse Autoencoder (SAE) structure in activation space. We empirically validate the hypothesis on pretrained LLMs and demonstrate that the activation distribution matters -- these results do not hold for Gaussian data, but rely crucially on structure in the distribution of neural network activations. Our results shed light on a general principle at play in MLP layers inside LLMs and offer an explanation for the effectiveness of modern MoE-based transformers. Additionally, our experimental explorations suggest new directions for more efficient MoE architecture design based on low-rank routers.
- North America > United States > Pennsylvania (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States (0.14)
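As a toy illustration of the hypothesis, the sketch below approximates a random ReLU MLP with a sparsely-activating MoE obtained by partitioning its hidden neurons into expert groups. The routing here peeks at the full pre-activation, so it demonstrates the approximation claim rather than a compute saving, and nothing here reproduces the paper's SAE-based construction; a real MoE would replace the scoring step with a cheap learned (e.g., low-rank) router.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, E = 64, 512, 8           # model dim, hidden dim, number of experts
W_in = rng.normal(size=(h, d)) / np.sqrt(d)
W_out = rng.normal(size=(d, h)) / np.sqrt(h)

def dense_mlp(x):
    return W_out @ np.maximum(W_in @ x, 0.0)   # ReLU MLP (GELU in real LLMs)

# Naive MoE approximation: partition the h hidden neurons into E experts
# and keep only the k expert groups whose neurons respond most strongly.
groups = np.array_split(np.arange(h), E)

def moe_mlp(x, k=2):
    pre = W_in @ x
    scores = np.array([np.linalg.norm(np.maximum(pre[g], 0)) for g in groups])
    out = np.zeros(d)
    for e in np.argsort(scores)[-k:]:          # top-k experts only
        g = groups[e]
        out += W_out[:, g] @ np.maximum(pre[g], 0.0)
    return out

x = rng.normal(size=d)
err = np.linalg.norm(dense_mlp(x) - moe_mlp(x)) / np.linalg.norm(dense_mlp(x))
print(f"relative approximation error: {err:.3f}")
```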
Overcoming error-in-variable problem in data-driven model discovery by orthogonal distance regression
Despite the recent proliferation of machine learning methods like SINDy that promise automatic discovery of governing equations from time-series data, there remain significant challenges to discovering models from noisy datasets. One reason is that the linear regression underlying these methods assumes that all noise resides in the training target (the regressand), which is the time derivative, whereas the measurement noise actually resides in the states (the regressors). Recent methods like modified-SINDy and DySMHO address this error-in-variable problem by leveraging information from the model's temporal evolution, but they also impose the equation as a hard constraint, which effectively assumes no error in the regressand. Without relaxation, this hard constraint prevents assimilation of data records longer than the Lyapunov time. Instead, fulfilment of the model equation should be treated as a soft constraint to account for the small yet critical error introduced by numerical truncation. The uncertainties in both the regressor and the regressand invite the use of orthogonal distance regression (ODR). By incorporating ODR into a Bayesian framework for model selection, we introduce a novel method for model discovery, termed ODR-BINDy, and assess its performance against current SINDy variants using the Lorenz63, Rössler, and Van der Pol systems as case studies. Our findings indicate that ODR-BINDy consistently outperforms all existing methods in recovering the correct model from sparse and noisy datasets. For instance, our ODR-BINDy method reliably recovers the Lorenz63 equation from data with noise contamination levels of up to 30%.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.48)
- Research Report > Promising Solution (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
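For readers unfamiliar with ODR, here is a minimal illustration of the building block (not the paper's ODR-BINDy) using scipy.odr on a toy cubic model with noise in both variables, the error-in-variables situation the abstract argues ordinary least squares mishandles.

```python
import numpy as np
from scipy.odr import ODR, Model, RealData

rng = np.random.default_rng(0)

# Toy error-in-variables problem: noise in BOTH x (regressor) and
# y (regressand). True model: y = b0 * x + b1 * x**3.
x_true = np.linspace(-2, 2, 200)
y_true = 1.5 * x_true - 0.5 * x_true**3
x_obs = x_true + 0.05 * rng.normal(size=x_true.size)
y_obs = y_true + 0.05 * rng.normal(size=y_true.size)

def f(beta, x):
    return beta[0] * x + beta[1] * x**3

# ODR minimizes weighted orthogonal distances to the model curve,
# accounting for uncertainty in x and y simultaneously.
data = RealData(x_obs, y_obs, sx=0.05, sy=0.05)
out = ODR(data, Model(f), beta0=[1.0, 1.0]).run()
print(out.beta)   # should be close to [1.5, -0.5]
```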
FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization
Wang, Aotao, Shao, Haikuo, Ma, Shaobo, Wang, Zhongfeng
State Space Models (SSMs), like the recent Mamba2, have achieved remarkable performance and received extensive attention. However, deploying Mamba2 on resource-constrained edge devices faces several problems: severe outliers within the linear layers that challenge quantization, diverse and irregular element-wise tensor operations, and hardware-unfriendly nonlinear functions in the SSM block. To address these issues, this paper presents FastMamba, a dedicated FPGA accelerator with hardware-algorithm co-design that improves the deployment efficiency of Mamba2. Specifically, we achieve 8-bit quantization for linear layers through a Hadamard transformation that eliminates outliers. Moreover, a hardware-friendly, fine-grained power-of-two quantization framework is presented for the SSM block and convolution layer, and a first-order linear approximation is developed to optimize the nonlinear functions. Building on this accurate quantization scheme, we propose an accelerator that integrates parallel vector processing units, a pipelined execution dataflow, and an efficient SSM Nonlinear Approximation Unit, which together enhance computational efficiency and reduce hardware complexity. Finally, we evaluate FastMamba on a Xilinx VC709 FPGA. For the input prefill task on Mamba2-130M, FastMamba achieves 68.80× and 8.90× speedups over an Intel Xeon 4210R CPU and an NVIDIA RTX 3090 GPU, respectively. In the output decode experiment with Mamba2-2.7B, FastMamba attains 6× higher energy efficiency than the RTX 3090 GPU.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Information Technology (0.49)
- Semiconductors & Electronics (0.35)
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
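A sketch of the Hadamard-rotation trick for outlier suppression before quantization, on a random weight matrix with an injected channel outlier. The per-tensor symmetric int8 scheme below is a simplification, not the paper's exact method, and in practice the inverse rotation is folded into adjacent layers rather than applied explicitly.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
n = 64
W = rng.normal(size=(n, n))
W[:, 3] *= 50.0                      # inject a severe channel outlier

def quantize_int8(M):
    # Simple per-tensor symmetric quantization (round-trip to int8 grid).
    scale = np.abs(M).max() / 127.0
    return np.round(M / scale) * scale

H = hadamard(n) / np.sqrt(n)         # orthonormal Hadamard matrix

# Direct quantization: the outlier channel inflates the scale and
# wastes precision on every other channel.
err_direct = np.linalg.norm(W - quantize_int8(W)) / np.linalg.norm(W)

# Hadamard trick: rotate, quantize, rotate back. H spreads the outlier
# energy across all channels, so the quantization grid is used well.
Wr = W @ H
err_hadamard = np.linalg.norm(W - quantize_int8(Wr) @ H.T) / np.linalg.norm(W)

print(f"direct:   {err_direct:.4f}")
print(f"hadamard: {err_hadamard:.4f}")
```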
Optimizing Basis Function Selection in Constructive Wavelet Neural Networks and Its Applications
Huang, Dunsheng, Shen, Dong, Lu, Lei, Tan, Ying
The wavelet neural network (WNN), which learns an unknown nonlinear mapping from data, has been widely used in signal processing and time-series analysis. However, the difficulty of constructing accurate wavelet bases and high computational costs limit its application. This study introduces a constructive WNN that selects suitable initial bases and then introduces new bases during training to reach a predefined accuracy while reducing computational cost. For the first time, we analyze the frequency content of unknown nonlinear functions and select appropriate initial wavelets matched to their primary frequency components by estimating the energy of each spatial-frequency component. This leads to a novel constructive framework, consisting of a frequency estimator and a wavelet-basis increase mechanism that prioritizes high-energy bases, significantly improving computational efficiency. The theoretical foundation defines the time-frequency range that high-dimensional wavelets must cover to achieve a given accuracy. The framework's versatility is demonstrated through four examples: estimating unknown static mappings from offline data, combining two offline datasets, identifying time-varying mappings from time-series data, and capturing nonlinear dependencies in real time-series data. These examples showcase the framework's broad applicability and practicality. All code will be released at https://github.com/dshuangdd/CWNN.
- North America > United States (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > China > Beijing > Beijing (0.04)
- (3 more...)
- Education > Educational Setting (0.46)
- Health & Medicine (0.46)
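A sketch of the frequency-estimation idea: rank the spatial-frequency components of sampled data by energy via the FFT, then map dominant frequencies to dyadic wavelet scales. The mother-wavelet center frequency f0 below is a hypothetical choice for illustration, not the paper's.

```python
import numpy as np

# Samples of an unknown 1-D mapping; in the paper's setting the primary
# frequency content guides which wavelet scales to seed the network with.
t = np.linspace(0, 1, 512, endpoint=False)
y = np.sin(2 * np.pi * 12 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)

# Energy of each spatial-frequency component from the FFT.
spec = np.abs(np.fft.rfft(y)) ** 2
freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
order = np.argsort(spec)[::-1]
print("highest-energy frequencies:", freqs[order[:2]])   # ~12 Hz and ~40 Hz

# Map a dominant frequency to a dyadic scale level j: a mother wavelet
# with center frequency f0, dilated by 2**j, peaks near f0 * 2**j, so
# choose the j that brings f0 * 2**j closest to the target frequency.
f0 = 0.25   # illustrative mother-wavelet center frequency (assumption)
for f_target in freqs[order[:2]]:
    j = int(round(np.log2(f_target / f0)))
    print(f"target {f_target:.1f} Hz -> scale level j = {j}")
```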
Causal rule ensemble approach for multi-arm data
Wan, Ke, Tanioka, Kensuke, Shimokawa, Toshio
Heterogeneous treatment effect (HTE) estimation is critical in medical research: it reveals how treatment effects vary among individuals, providing statistical evidence for precision medicine. While most existing methods focus on binary treatment settings, real-world applications often involve multiple interventions. Current HTE estimation methods, however, are primarily designed for binary comparisons and often rely on black-box models, which limits their applicability and interpretability in multi-arm settings. To address these challenges, we propose an interpretable machine learning framework for HTE estimation in multi-arm trials. Our method employs a rule-based ensemble approach consisting of rule generation, rule ensembling, and HTE estimation, ensuring both predictive accuracy and interpretability. Through extensive simulation studies and real-data applications, the performance of our method was evaluated against state-of-the-art multi-arm HTE estimation approaches. The results indicate that our approach achieves lower bias and higher estimation accuracy than existing methods. Furthermore, the interpretability of our framework allows clearer insights into how covariates influence treatment effects, facilitating clinical decision making. By bridging the gap between accuracy and interpretability, our study contributes a valuable tool for multi-arm HTE estimation in support of precision medicine.
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Research Report > Strength High (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
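A rough sketch of a rule-generation / rule-ensemble / HTE-estimation pipeline in the spirit the abstract describes, on synthetic 3-arm data. The leaves-of-shallow-boosted-trees rule step and the lasso over rule-by-arm interactions are stand-in choices in the style of RuleFit, not the authors' method.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Synthetic 3-arm trial: treatment effects depend on x0 (heterogeneity).
n = 2000
X = rng.normal(size=(n, 5))
arm = rng.integers(0, 3, size=n)     # arm 0 is control
tau = np.stack([np.zeros(n), 1.0 * (X[:, 0] > 0), 2.0 * (X[:, 0] > 0)], axis=1)
y = X[:, 1] + tau[np.arange(n), arm] + rng.normal(scale=0.5, size=n)

# Step 1 -- rule generation: shallow boosted trees; each leaf defines an
# interpretable rule (a conjunction of covariate splits).
gb = GradientBoostingRegressor(max_depth=2, n_estimators=50).fit(X, y)
leaves = gb.apply(X).reshape(n, -1)  # leaf index per tree
rules = OneHotEncoder().fit_transform(leaves).toarray()

# Step 2 -- rule ensemble: sparse linear model over rule-by-arm
# interactions; coefficients are arm-specific rule effects.
arm_dummies = OneHotEncoder().fit_transform(arm[:, None]).toarray()
design = np.hstack([rules] + [rules * arm_dummies[:, [a]] for a in (1, 2)])
fit = LassoCV(cv=3).fit(design, y)

# Step 3 -- HTE estimation: predicted difference between an arm and
# control at the same covariates.
def cate(a):
    R = rules.shape[1]
    base = np.hstack([rules, np.zeros((n, 2 * R))])      # control encoding
    treat = base.copy()
    treat[:, R * a:R * (a + 1)] = rules                  # switch on arm a
    return fit.predict(treat) - fit.predict(base)

print("mean estimated effect, arm 1:", cate(1).mean())   # ~0.5 here
```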
MixFunn: A Neural Network for Differential Equations with Improved Generalization and Interpretability
Farias, Tiago de Souza, de Lima, Gubio Gomes, Maziero, Jonas, Villas-Boas, Celso Jorge
We introduce MixFunn, a novel neural network architecture designed to solve differential equations with enhanced precision, interpretability, and generalization capability. The architecture comprises two key components: the mixed-function neuron, which integrates multiple parameterized nonlinear functions to improve representational flexibility, and the second-order neuron, which combines a linear transformation of its inputs with a quadratic term that captures cross-combinations of input variables. These features significantly enhance the expressive power of the network, enabling it to achieve comparable or superior results with drastically fewer parameters (a reduction of up to four orders of magnitude relative to conventional approaches). We applied MixFunn in a physics-informed setting to solve differential equations in classical mechanics, quantum mechanics, and fluid dynamics, demonstrating higher accuracy and improved generalization to regions outside the training domain relative to standard machine learning models. Furthermore, the architecture facilitates the extraction of interpretable analytical expressions, offering valuable insights into the underlying solutions.
- South America > Brazil (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Overview (0.92)
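A minimal sketch of the two neuron types as the abstract describes them: a second-order neuron (linear term plus a quadratic form) feeding a mixed-function neuron (a softmax-weighted mix of parameterized nonlinearities). The specific function set and mixing scheme below are illustrative guesses, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def second_order_neuron(x, w, Q, b):
    # Linear term plus a quadratic form that captures cross-combinations
    # of input variables, per the abstract's description.
    return w @ x + x @ Q @ x + b

def mixed_function_neuron(z, alpha, params):
    # Softmax-weighted mix of several parameterized nonlinearities; the
    # function set and parameterization here are assumptions.
    funcs = [np.sin, np.tanh, lambda t: t, lambda t: t**2]
    weights = np.exp(alpha) / np.exp(alpha).sum()
    return sum(wk * f(pk * z) for wk, f, pk in zip(weights, funcs, params))

d = 3
x = rng.normal(size=d)
w, Q, b = rng.normal(size=d), rng.normal(size=(d, d)), 0.1
alpha, params = rng.normal(size=4), rng.normal(size=4)

z = second_order_neuron(x, w, Q, b)
print(mixed_function_neuron(z, alpha, params))
```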
Export Reviews, Discussions, Author Feedback and Meta-Reviews
They apply this model to text documents as a "deep belief topic model" (my own phrasing). They derive an inference / update step, in which Dirichlet vectors are propagated up the network and gamma weights are propagated back down it, and perform an empirical evaluation of the model. Overall this paper was written fairly clearly. I thought the paper was mostly fine, but I believe that the empirical analysis could be improved (more on that shortly). I overall maintain my general sentiment about the paper and look forward to more discussion of this work's relationship to traditional neural / deep networks.